Abstract


We use a large single particle tracking data set to analyze the short time and small
spatial scale motion of quantum dots labeling proteins in cell membranes. Our analysis
focuses on the jumps, which are the changes in the position of the quantum dots between
frames in a movie of their motion. Previously we have shown that the directions of
the jumps are uniformly distributed and that the jump lengths can be characterized by a
double power law distribution.

Here we show that the jumps over a small number of time steps can be described
by scalings of a single double power law distribution. This provides additional strong
evidence that the double power law gives an accurate description of the fine scale
motion and that it is a novel stable distribution for the motion. It also supports the
earlier result that the motion can be modeled as diffusion in a space of fractional
dimension roughly 3/2. The form of the power law distribution quantifies the excess of
short jumps in the data and provides an accurate characterization of the fine scale
diffusion; in fact, this distribution gives an accurate description of the jump lengths up
to a few hundred nanometers. Our results complement the usual mean squared
displacement analysis used to study diffusion at larger scales, where the proteins are
more likely to interact strongly with larger membrane structures.

Keywords: Single particle tracking, Protein motion, Jump probability distribution, Stable distribution
1 Introduction


Our goal is to better understand the fine scale motion of proteins in cell membranes. We
do this by studying a large single particle tracking data set for the IgE high affinity
receptor FcεRI tagged with quantum dots (QDs), which was obtained from movies of
the positions of the QDs. Standard time series analysis requires the analysis of the jumps,
which are the changes in the positions of the QDs between frames in the movie, as was done
in [iniET]. The components of the jumps in a Brownian random walk in two dimensions are
normally distributed if and only if the directions of the jumps are uniformly distributed,
independently of the lengths, and the lengths of the jumps have a chi distribution with two
degrees of freedom (equivalently, a Rayleigh distribution, which is a Weibull distribution
with shape parameter 2). A main result of the previous studies is that the directions of the
jumps are uniformly distributed, just as in Brownian motion, but the jump lengths do not
have a chi distribution. Instead, the studies provide evidence that the jumps can be described
using a double power law distribution. It was also shown in [10] that a double power law fits
the jump data better than general chi or general Weibull distributions. Our power law
distribution has two power law behaviors: most importantly, one for small jumps, and
another, less important, power law behavior for large jumps.
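As a quick numerical illustration of the Brownian case, here is a sketch that draws two-dimensional jumps with independent normal components and checks that the angles look uniform and that the lengths follow the chi distribution with two degrees of freedom (the Rayleigh distribution, with CDF $1 - e^{-r^2/(2\sigma^2)}$). The component standard deviation `sigma`, the sample size and the seed are arbitrary choices for illustration, not values from the data.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed for reproducibility
sigma = 1.0                              # component standard deviation (arbitrary)
n_jumps = 200_000

# Brownian-type jumps: independent normal components
dx = rng.normal(0.0, sigma, n_jumps)
dy = rng.normal(0.0, sigma, n_jumps)

lengths = np.hypot(dx, dy)
angles = np.arctan2(dy, dx)

# Uniform angles: each quadrant should hold about 1/4 of the jumps
quadrant_fraction = np.mean((angles >= 0) & (angles < np.pi / 2))

# Chi (Rayleigh) lengths: P(L <= sigma) = 1 - exp(-1/2), about 0.3935
empirical_p = np.mean(lengths <= sigma)
rayleigh_p = 1.0 - np.exp(-0.5)

print(f"fraction in first quadrant: {quadrant_fraction:.4f}")
print(f"P(L <= sigma): empirical {empirical_p:.4f}, Rayleigh {rayleigh_p:.4f}")
```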

Here we study the jumps over a few time steps, not just one time step, and provide
much stronger evidence that the previous descriptions of the motion are accurate. It came
as a surprise to us that the double power laws for multiple time steps are related to the one
time step power law by a square root of time scaling law, just as in Brownian motion. This
allows us to characterize the diffusion on short time and small spatial scales using a single
double power law probability distribution function with scaling, and is the key to providing
stronger evidence for the previous characterization of the motion. Moreover, the double power
law is surprisingly accurate for moderate spatial and temporal scales, does not have a heavy
tail and, in fact, has many finite moments.

An important point is that the measured motion of the quantum dots has several
components, one of which is the actual motion of the protein. The others include movement
of the dot while the image is being taken, noise in the imaging equipment, and the super
resolution imaging process. These errors have been studied extensively in the context of MSD
analysis, see e.g. |ll[9l[TTl[T5l[T6l[T8ll2lll23l[2l]. However, it is reasonable to assume that all
these random processes are independent, and that all but the underlying motion of the protein
are normally distributed. Since a sum of independent normal random variables is normal, if the
underlying motion of the protein were normally distributed, then the measured motion would
also be normally distributed, but it is not. The most likely cause of the double power law
behavior is therefore the motion of the proteins.
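The independence argument above can be illustrated with the kurtosis of one jump component, which equals 3 for a normal distribution: independent normal noise added to normal motion stays normal, while a non-normal motion remains visibly non-normal after the noise is added. In this sketch the stand-in non-normal motion (isotropic jumps whose lengths are chi distributed with 1.5 degrees of freedom) and the noise level `noise_sd` are illustrative assumptions, not a model of the actual data or imaging errors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000
noise_sd = 0.5                       # hypothetical measurement error per component

def kurtosis(x):
    """Pearson kurtosis E[(x - mean)^4] / Var(x)^2; equals 3 for a normal distribution."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2

# Normal protein motion + independent normal measurement error -> still normal
normal_motion = rng.normal(0.0, np.sqrt(0.75), n)
k_normal = kurtosis(normal_motion + rng.normal(0.0, noise_sd, n))

# Non-normal stand-in motion: chi lengths with d = 1.5 (r^2 ~ Gamma(d/2, scale=2)), uniform angles
d = 1.5
r = np.sqrt(rng.gamma(d / 2, 2.0, n))
theta = rng.uniform(-np.pi, np.pi, n)
k_nonnormal = kurtosis(r * np.cos(theta) + rng.normal(0.0, noise_sd, n))

print(f"kurtosis, normal motion + noise:     {k_normal:.3f}")      # close to 3
print(f"kurtosis, non-normal motion + noise: {k_nonnormal:.3f}")   # close to 3.28 in theory
```

Both motions have the same component variance (0.75), so the difference in kurtosis comes only from the shape of the motion's distribution.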

This work has important implications for modeling protein motion. The Central Limit
Theorem implies that if we could model our data as a random walk in a homogeneous
medium that has independent and identically distributed (IID) jumps with double power
law distributed jump lengths (which have a finite second moment, so the theorem applies),
then the components of the jumps over many time steps should approach a normal
distribution or, equivalently, the jump angles must be uniformly distributed and the jump
lengths must have a chi distribution. We can view our data as sampling the motion after the
protein has made many smaller time steps. If the motion could be described as an IID random
walk, then the jump lengths would have to have a chi distribution, which they do not. We take
this to mean that the diffusion of proteins in the cell membrane is a complex process that
cannot be accurately modeled using an IID random walk in a homogeneous medium. We note
that previous work [10] supports the assumption that the jumps are independent.
Collaborators have recently made simulations of random walks in non-homogeneous media
that have statistics similar to our data [7].

To understand why we view the motion as diffusion in a space of dimension 3/2, one
needs to know that in dimension n, for IID random walks whose components are independent
and normally distributed with mean zero and standard deviation s, the directions of the jumps
are uniformly distributed on the unit sphere and the jump lengths have the chi distribution
with n degrees of freedom and scale factor s, $c(r,s,n) = c(r/s,n)/s$, where [10]

$$c(r,n) = \frac{r^{n-1}\, e^{-r^2/2}}{2^{n/2-1}\,\Gamma(n/2)}, \qquad (1.1)$$

$r > 0$, $\sigma^2 = n s^2 > 0$ is the mean squared jump length, and $n > 1$. In two dimensions
this is the Rayleigh distribution, a special case of the Weibull distribution, while in three
dimensions it is the Maxwell-Boltzmann distribution. For small r, this distribution has the form

$$c(r,n) \approx C\, r^{n-1},$$

where C is a constant. Consequently, for the fine scale motion, if a probability distribution
function for the lengths of jumps has the form

$$p(r) \approx C\, r^{d-1}$$

for small r, where C is a constant, then we say that the motion can be modeled as being
in a space of dimension d. For our data, $d \approx 3/2$. This is a quantitative measure of the
restrictions on the motion of the protein on small spatial and temporal scales.
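To make (1.1) and this notion of dimension concrete, here is a small plain-Python sketch (the function names are ours, not from the paper). It evaluates the chi density for a non-integer n, checks that the density integrates to one, and recovers n from the log-log slope of the density near r = 0, which is exactly how $p(r) \approx C\, r^{d-1}$ defines d.

```python
import math

def chi_pdf(r, n, s=1.0):
    """Chi density with n (possibly non-integer) degrees of freedom and scale s,
    c(r, s, n) = c(r/s, n)/s with c(r, n) as in (1.1)."""
    x = r / s
    return x ** (n - 1) * math.exp(-x * x / 2) / (2 ** (n / 2 - 1) * math.gamma(n / 2)) / s

def midpoint_integral(f, a, b, steps=200_000):
    """Midpoint rule; accurate enough here even with the r**(n-1) behavior at 0."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def small_r_dimension(pdf, r1=1e-4, r2=1e-3):
    """Estimate d from p(r) ~ C r^(d-1): the log-log slope near r = 0 is d - 1."""
    slope = (math.log(pdf(r2)) - math.log(pdf(r1))) / (math.log(r2) - math.log(r1))
    return slope + 1

for n in (1.5, 2, 3):
    total = midpoint_integral(lambda r: chi_pdf(r, n), 0.0, 20.0)
    dim = small_r_dimension(lambda r: chi_pdf(r, n))
    print(f"n = {n}: integral = {total:.6f}, small-r dimension = {dim:.4f}")
```

For n = 2 the formula reduces to the Rayleigh density $r e^{-r^2/2}$, and for n = 1.5 the slope test returns a dimension of about 1.5.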

Our results complement those obtained by mean squared displacement analysis, which
uses data from more time steps than are used here and consequently is appropriate for
temporal and spatial scales where the proteins may interact strongly with other
membrane structures [Tll^lGl fT^fTSl 120112^126].

The biology and microscopy involved in creating the data are described in detail in [2113] , 
so we will not repeat this here. We will use the same notation as in ra. which we now 
summarize. 


1.1 Random Variables 

We model the QDs' positions using vector valued random variables:

$$P_n = (X_n, Y_n), \qquad 1 \le n \le N, \quad N > 0,$$

where $X_n$ and $Y_n$ are real valued random variables and N and n are integers. The jumps
are also random variables:

$$J_n = P_n - P_{n-1} = (\Delta X_n, \Delta Y_n), \qquad 2 \le n \le N.$$
Data set    k = 1      k = 2      k = 3
A           407,669    346,750    302,256
B           353,368    300,517    261,474

Table 1.1: The total number of valid jumps over k = 1, 2, and 3 time steps in data sets A and B.


In polar coordinates, the lengths of the jumps $L_n$ and the angles $\Theta_n$ between the jump
vectors and the x-axis are also real valued random variables:

$$L_n = \|J_n\| = \sqrt{\Delta X_n^2 + \Delta Y_n^2}, \qquad \Theta_n = \arctan(\Delta Y_n, \Delta X_n), \qquad 2 \le n \le N,$$

where arctan gives a value in $(-\pi, \pi]$ such that if $L_n \ne 0$, then $\cos(\Theta_n) = \Delta X_n / L_n$ and
$\sin(\Theta_n) = \Delta Y_n / L_n$, and consequently $\tan(\Theta_n) = \Delta Y_n / \Delta X_n$ if $\Delta X_n \ne 0$. If $J = (0,0)$,
then $\Theta = 0$ (as in MATLAB).
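A sketch of these definitions for a single track, using NumPy (the function name is hypothetical). `numpy.arctan2` follows the same conventions as MATLAB's `atan2`, including returning 0 for a zero jump; the extra `np.where` line maps the value −π, which `arctan2` can return for a signed zero, onto π so the angles lie in (−π, π].

```python
import numpy as np

def jumps_lengths_angles(x, y):
    """Jumps J_n = P_n - P_{n-1}, lengths L_n = ||J_n|| and angles Theta_n for one track."""
    dx = np.diff(np.asarray(x, float))      # Delta X_n
    dy = np.diff(np.asarray(y, float))      # Delta Y_n
    lengths = np.hypot(dx, dy)
    angles = np.arctan2(dy, dx)             # in [-pi, pi]; arctan2(0, 0) == 0
    angles = np.where(angles == -np.pi, np.pi, angles)  # so that angles lie in (-pi, pi]
    return dx, dy, lengths, angles

dx, dy, lengths, angles = jumps_lengths_angles([0, 1, 1, 1], [0, 0, 1, 1])
print(lengths)   # [1. 1. 0.]
print(angles)    # 0, pi/2 and 0 (the last jump is (0, 0))
```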

1.2 Tracking Data 

The analyzed data were taken from unstimulated cells where a subset of the FcεRI on the
cell membrane were labeled with QDs. We studied two large data sets, called A and B. We
analyzed each data set individually, as it is useful to see the differences between the two sets,
but for our new results we combine the data sets to obtain better accuracy. The data contain
M > 0 tracks with N > 0 positions described by

$$(x_{m,n},\ y_{m,n},\ v_{m,n}), \qquad 1 \le m \le M, \quad 1 \le n \le N.$$

The vectors

$$p_{m,n} = (x_{m,n},\ y_{m,n})$$

estimate the position of the QDs. If $v_{m,n} = 1$, then the QD is on and the position of the QD
is valid data, while if $v_{m,n} = 0$, the QD is off.

The length of a jump over k time steps is

$$L_{m,n,k} = \|p_{m,n+k} - p_{m,n}\|, \qquad 1 \le m \le M, \quad 1 \le n \le N - k.$$

The jump is valid provided that the QD is on at all times n through n + k, that is,

$$v_{m,n}\, v_{m,n+1} \cdots v_{m,n+k} = 1, \qquad 1 \le m \le M, \quad 1 \le n \le N - k.$$

The number of valid jumps in data sets A and B is given in Table 1.1. These are large data
sets.
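The validity rule can be sketched with NumPy, assuming the tracks are stored as (M, N) arrays of x, y and the on/off flag v (this storage layout and the function name are assumptions for illustration). The number of valid jumps can only decrease as k grows, since every valid k-step jump starting at frame n also gives a valid (k − 1)-step jump starting at frame n, which matches the trend in Table 1.1.

```python
import numpy as np

def valid_jump_lengths(x, y, v, k):
    """Lengths L_{m,n,k} = ||p_{m,n+k} - p_{m,n}|| of all valid k-step jumps.

    x, y, v have shape (M, N); v[m, n] == 1 when QD m is on in frame n.
    A jump is valid when v[m, n] * v[m, n+1] * ... * v[m, n+k] == 1.
    """
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    on = np.asarray(v) == 1
    N = on.shape[1]
    valid = np.ones((on.shape[0], N - k), dtype=bool)
    for j in range(k + 1):                 # QD must be on in every frame n, ..., n + k
        valid &= on[:, j:N - k + j]
    lengths = np.hypot(x[:, k:] - x[:, :N - k], y[:, k:] - y[:, :N - k])
    return lengths[valid]

# One track with five frames; the QD is off in the third frame.
x = [[0, 1, 2, 3, 4]]
y = [[0, 0, 0, 0, 0]]
v = [[1, 1, 0, 1, 1]]
for k in (1, 2, 3):
    print(k, valid_jump_lengths(x, y, v, k))
```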
2 Analyzing the Data


Figure 2.1: The CDFs for data sets A and B ((a) data set A, (b) data set B).

Figure 2.2: The PDFs for data sets A and B ((a) data set A, (b) data set B).

Our first step in analyzing the data is to plot the cumulative distribution functions
(CDFs) and probability distribution functions (PDFs) of the jump lengths, which are shown
in Figures 2.1 and 2.2. The CDF is computed by first sorting the lengths into increasing
size. Assuming that there are $I > 0$ lengths $r_i$, $1 \le i \le I$, increasing size means that
$r_i \le r_{i+1}$, $1 \le i \le I - 1$. The CDFs shown in Figure 2.1 are determined by the pairs $(r_i, i/I)$.
This method of determining the CDFs is helpful as it makes use of all of the data without
any of the averaging that occurs when binning the data. Next we find the PDFs shown
in Figure 2.2 by the standard method of binning the data using 200 bins, thus averaging over
about 1/2% of the data for each value of the PDF. These plots use all of the valid jumps
in each of the data sets, and n is the number of time steps determining the jumps. Observe
that as the number of time steps increases, the height of the PDFs decreases while the width
increases, suggesting that the PDFs are related by a scaling. As can be seen in Figure 2.2
and noted in ng, for the single time step jumps (n = 1), the data for r > 346nm have
significant errors and consequently are not used in our analysis. As we are most interested
in short jumps, this does not cause a problem.
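Both estimates are straightforward to sketch with NumPy (the function names are ours): the CDF uses every sorted length directly through the pairs $(r_i, i/I)$, while the PDF is a 200-bin histogram normalized to unit area.

```python
import numpy as np

def empirical_cdf(lengths):
    """Pairs (r_i, i/I) after sorting the I lengths so that r_i <= r_{i+1}; no binning."""
    r = np.sort(np.asarray(lengths, float))
    return r, np.arange(1, r.size + 1) / r.size

def binned_pdf(lengths, bins=200):
    """Histogram estimate of the PDF, normalized so that sum(density * width) == 1."""
    density, edges = np.histogram(lengths, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density, np.diff(edges)

r, P = empirical_cdf([3.0, 1.0, 2.0])
print(r, P)        # r = [1, 2, 3], P = [1/3, 2/3, 1]
```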

2.1 Scaling the Data 




Figure 2.3: The scaled CDFs for all data in the data sets A and B ((a) data set A, (b) data set B).

Figure 2.4: The scaled PDFs for all data in the data sets A and B ((a) data set A, (b) data set B).

If the CDF and PDF are smooth, then they are related by

$$p(r) = \frac{dP(r)}{dr}.$$

If s > 0 is to be used as a scaling factor, then the scaled CDF and PDF are given by

$$P\!\left(\frac{r}{s}\right) \qquad\text{and}\qquad \frac{1}{s}\, p\!\left(\frac{r}{s}\right).$$

We tried scaling the data for 1, 2 and 3 time step jumps by √1 = 1, √2 and √3, which
is done by dividing the jumps by the scale factor. The results of the scaling are shown in
Figures 2.3 and 2.4. It is a surprise that the scaled CDFs and PDFs are so similar and that the
scaling property is the same as for Brownian motion. Thus, even though the components of
the jumps are not normally distributed, the angles of the jumps can be modeled as uniformly
distributed and the jump lengths can be modeled by a single PDF or CDF that is scaled by
√n. We now turn to quantifying this idea.

Because of the errors in the single time step jumps for r > 346nm, we only analyze the
scaled data for r < 346nm, which corresponds to the unscaled jumps satisfying r < 346nm
when n = 1, r < 489nm when n = 2 and r < 599nm when n = 3. However, we plot
our results for scaled data satisfying r < 500nm, which scales to r < 500nm when n = 1,
r < 707nm when n = 2 and r < 866nm when n = 3, to emphasize that our results are quite
good even for large jumps.
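A minimal sketch of the square root of time scaling (the helper names are hypothetical): it divides the k-step lengths by √k, reproduces the unscaled cutoffs quoted above, and checks on a simulated two-dimensional Brownian walk, used here only as a reference case, that the scaled root mean squared jump length is the same for k = 1, 2, 3.

```python
import math
import numpy as np

def scale_by_sqrt_k(lengths_by_k):
    """Divide the k-step jump lengths by sqrt(k) (square root of time scaling)."""
    return {k: np.asarray(r, float) / math.sqrt(k) for k, r in lengths_by_k.items()}

def unscaled_cutoffs(r_scaled, ks=(1, 2, 3)):
    """Unscaled cutoffs sqrt(k) * r_scaled matching a cutoff r_scaled on the scaled lengths."""
    return {k: r_scaled * math.sqrt(k) for k in ks}

print({k: round(c) for k, c in unscaled_cutoffs(346).items()})   # {1: 346, 2: 489, 3: 599}
print({k: round(c) for k, c in unscaled_cutoffs(500).items()})   # {1: 500, 2: 707, 3: 866}

# Sanity check on a simulated 2D Brownian walk (unit-variance normal steps):
# after scaling, the root mean squared k-step jump length is the same for every k.
rng = np.random.default_rng(3)
paths = np.cumsum(rng.normal(size=(20_000, 30, 2)), axis=1)      # walkers x frames x (x, y)
lengths_by_k = {k: np.linalg.norm(paths[:, k:] - paths[:, :-k], axis=-1).ravel() for k in (1, 2, 3)}
rms = {k: float(np.sqrt(np.mean(r**2))) for k, r in scale_by_sqrt_k(lengths_by_k).items()}
print(rms)                                                        # each value close to sqrt(2), about 1.414
```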
3 Fitting the Data


We begin by describing the double power law distribution and its scaling law. We then
show how to generate random numbers having the double power law probability distribution
function and finish by using the double power law to fit the jump data.

3.1 The double power law distribution 

In order for the parameters to have a more intuitive meaning, we change the notation
from that in [10]. The functional form is chosen so that the PDF p(r) is a power law
for both small and large r, which is substantially more restrictive than having only one of
these conditions hold. Additionally, we require that the PDF have a simple CDF so that it
is easy to simulate random numbers with the given PDF. The result is our double power law.
For small r > 0, we require

$$p(r) \approx C_1\, r^{\alpha - 1}, \qquad \alpha > 1,$$

because then α corresponds to the dimension of the space (see (1.1)) in which the diffusion
is occurring. For large r we want the PDF to decay rapidly, so we choose



The condition on β guarantees that the PDF has a finite integral. We find that a good choice for
the PDF p and the corresponding CDF P is

Also, to make the values of the parameters easy to interpret, we need to find a scale
factor S so that the second moment of the scaled power law is one. The first and
second moments of any distribution p are

$$M_1(p) = \int_0^\infty r\, p(r)\, dr, \qquad M_2(p) = \int_0^\infty r^2\, p(r)\, dr.$$

For the power law distribution, integration gives



If we set

$$p_L(r) = S\, p(Sr), \qquad P_L(r) = P(Sr), \qquad (3.1)$$

then



and the first and second moments of this distribution are

$$M_1(p_L) = \frac{M_1(p)}{S}, \qquad M_2(p_L) = \frac{M_2(p)}{S^2}.$$

Consequently, if

$$S = \sqrt{M_2(p)}, \qquad (3.2)$$

then $M_2(p_L) = 1$. The required double power law distributions are then given by (3.1).
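The effect of the rescaling (3.1) with the scale factor (3.2) on the moments can be checked numerically for any density. Since the explicit double power law formula is not reproduced in this text, this sketch uses as a stand-in the chi density with 1.5 degrees of freedom, whose second moment is exactly 1.5; the helper names and the integration settings are assumptions.

```python
import math

N_DOF = 1.5
NORM = 2 ** (N_DOF / 2 - 1) * math.gamma(N_DOF / 2)

def chi15_pdf(r):
    """Stand-in density: chi distribution with 1.5 degrees of freedom (second moment 1.5)."""
    return r ** (N_DOF - 1) * math.exp(-r * r / 2) / NORM

def moment(pdf, k, r_max=30.0, steps=200_000):
    """k-th moment, the integral of r**k * pdf(r) over [0, r_max], by the midpoint rule."""
    h = r_max / steps
    return h * sum(((i + 0.5) * h) ** k * pdf((i + 0.5) * h) for i in range(steps))

M1, M2 = moment(chi15_pdf, 1), moment(chi15_pdf, 2)
S = math.sqrt(M2)                     # scale factor, as in (3.2)

def p_L(r):
    """Rescaled density p_L(r) = S p(S r), as in (3.1)."""
    return S * chi15_pdf(S * r)

M1_L, M2_L = moment(p_L, 1), moment(p_L, 2)
print(f"M2 = {M2:.5f}, S = {S:.5f}")
print(f"M1(p_L) = {M1_L:.5f}, M1/S = {M1 / S:.5f}, M2(p_L) = {M2_L:.5f}")
```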

3.2 Simulating the double power law distribution

Figure 3.1: The analytic and simulated pL and PL distributions with σ = 1 are essentially
identical. Both panels are titled "Power Law Generated Random Numbers, n = 1000000"; (a) the CDF, (b) the PDF.


As is well known, if u is a uniformly distributed random number in [0,1], then solving
P(r) = u for r will produce a random number with the given CDF P and associated PDF p.
Therefore, we can generate random numbers with the pL distribution by solving

$$P_L(r) = u,$$

where u is uniformly distributed. This gives
Fit     α       β         s (nm)    residual
PDF     1.49    39.47     148.52    1.9e-08
CDF     1.49    256.86    145.73    6.8e-06

Table 3.1: The values α, β, s and the residual for the fits.


where S is given in (3.2). Because u is uniformly distributed in [0,1], so is 1 − u; hence
1 − u can be replaced by u in this expression, which simplifies it.

We numerically checked that this distribution has σ = 1 and plot comparisons of the
simulated and analytic PDF and CDF in Figure 3.1. Double power law random numbers
with second moment σ² are given by σr.
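Since the closed-form inverse of PL is not reproduced in this text, here is a generic sketch of the same inverse-transform idea that solves P(r) = u numerically by vectorized bisection. It is tested on the Rayleigh CDF $P(r) = 1 - e^{-r^2/2}$, a stand-in with a known inverse and not the double power law; the last lines show the rescaling by σ for a target second moment σ². The function names, the bracketing interval `r_hi` and the value of `sigma` are assumptions.

```python
import numpy as np

def inverse_transform(cdf, u, r_hi=1e3, iters=80):
    """Solve cdf(r) = u for each u by bisection on [0, r_hi]; cdf must be increasing."""
    u = np.asarray(u, float)
    lo = np.zeros_like(u)
    hi = np.full_like(u, r_hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        below = cdf(mid) < u            # the root lies to the right of mid
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
    return 0.5 * (lo + hi)

def rayleigh_cdf(r):
    return 1.0 - np.exp(-r**2 / 2)

u_check = np.array([0.1, 0.5, 0.9])
exact = np.sqrt(-2.0 * np.log1p(-u_check))     # analytic inverse of the Rayleigh CDF
approx = inverse_transform(rayleigh_cdf, u_check)

rng = np.random.default_rng(4)
r = inverse_transform(rayleigh_cdf, rng.uniform(size=200_000))
unit = r / np.sqrt(2.0)          # this Rayleigh has E[r^2] = 2; rescale to second moment 1
sigma = 150.0                    # hypothetical target rms jump length (nm)
samples = sigma * unit           # second moment approximately sigma**2
print(approx, exact)
print(np.mean(unit**2), np.sqrt(np.mean(samples**2)))
```

With a closed-form inverse, as in the text, the bisection loop is unnecessary; the numerical version is useful when only P is available.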

3.3 Fitting the PDF and CDF 




Figure 3.2: Plots of the fits of both the PDF and CDF of the data: (a) CDF, (b) PDF. The vertical red line
indicates the cutoff used for fitting the data.


We now use least-squares data fitting to find a double power law that fits
the jump data, that is, we must estimate the parameters α and β and the scale parameter
s. To obtain maximal accuracy for the parameters, we combine all the scaled jump sizes
from data sets A and B for 1, 2 and 3 time steps. As noted above, and as can be seen in
Figure 3.2b, the data for r > 346nm have significant errors, so we only fit jumps with
r < 346nm. This gives 1,971,537 values for the fits, which are excellent for r < 346nm.
In fact, we fit both the PDF and CDF independently to obtain

$$\alpha_{CDF},\ \beta_{CDF},\ s_{CDF} \qquad\text{and}\qquad \alpha_{PDF},\ \beta_{PDF},\ s_{PDF}.$$

However, the parameters of the fit to the PDF can be used in the fit to the CDF and conversely.
As is common with fitting problems, the solution is not unique, as shown in Table 3.1, but
the graphs of the fits are essentially the same. We consider the fit with β ≈ 40 the simplest
and thus the better fit. We plot both of these fits in Figure 3.2. We are most interested in
the parameter α, which describes the short jumps and is the same (α = 1.49) for both fits.
Also, the square root of the second moment of all of the scaled jumps is 143.85nm and, as
expected, the scale factors are close to this value.
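Because α governs the short jumps, it can also be cross-checked without the full fit: for small r, $p(r) \approx C_1 r^{\alpha-1}$ implies $P(r) \approx (C_1/\alpha)\, r^{\alpha}$, so α is the slope of log P against log r. This sketch estimates that slope by ordinary least squares with `numpy.polyfit` on synthetic lengths with $P(r) = r^{1.5}$ on [0, 1]. It is a simplification introduced for illustration, not the fitting procedure used for Table 3.1, and the cutoff `r_max` is an assumed parameter.

```python
import numpy as np

def alpha_from_small_jumps(lengths, r_max):
    """Least-squares slope of log P(r) vs log r over the empirical CDF for 0 < r <= r_max."""
    r = np.sort(np.asarray(lengths, float))
    P = np.arange(1, r.size + 1) / r.size            # empirical CDF at the sorted lengths
    keep = (r > 0) & (r <= r_max)
    slope, _intercept = np.polyfit(np.log(r[keep]), np.log(P[keep]), 1)
    return slope

rng = np.random.default_rng(5)
synthetic = rng.uniform(size=200_000) ** (1 / 1.5)  # inverse transform for P(r) = r**1.5
alpha_hat = alpha_from_small_jumps(synthetic, r_max=0.5)
print(f"estimated alpha = {alpha_hat:.3f}")          # close to 1.5
```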
4 Results and Discussion


We have shown that the small spatial scale and short time motion of QDs labeling proteins on
the cell membrane can be characterized by a single double power law and square root of
time scaling. In fact, this distribution is a good fit for all of the scaled jumps under 340nm.
This power law quantifies the idea that this motion has substantially more short jumps than
if the motion were a Brownian random walk. Equivalently, the motion can be viewed as
diffusion in a space of dimension 3/2, which also quantifies the small scale restrictions on the
motion. We note that this notion of diffusion does not seem to be closely related to diffusion
on fractals [19], where the MSD was used to characterize the motion. On the other hand, the
fact that the jumps over a few time steps are all described by one probability distribution and
a scale factor suggests that this distribution is stable [HI] and thus captures important
properties of the motion.

To accurately study the interaction of molecules on a cell membrane, one can use a stochastic
simulator where the time step is chosen so that the proteins typically move only a few
nanometers. Our results can be extended to provide the probability that two proteins will
interact in time intervals where the proteins move a few hundred nanometers, providing a
tool for significantly reducing the simulation cost.

We note that the Central Limit Theorem puts strong restrictions on modeling the
motion. To confirm this, we used our double power law random number generator combined
with uniformly generated angles to simulate random walks. For 100 walkers going 10 time
steps, the displacement over the 10 steps is already approximately normally distributed, as
predicted by the Central Limit Theorem. This implies that the motion cannot be accurately
modeled as an IID random walk in a homogeneous medium.
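The Central Limit Theorem argument can be reproduced in a few lines. This sketch assumes IID steps with uniformly distributed angles and, as a stand-in for the double power law (whose explicit formula is not reproduced here), step lengths that are chi distributed with 1.5 degrees of freedom, so that $p(r) \approx C r^{1/2}$ for small r. The kurtosis of the x component of the displacement moves toward the normal value 3 as steps are added. The walker count is much larger than the 100 used above only so that the kurtosis estimate is stable.

```python
import numpy as np

rng = np.random.default_rng(6)
walkers, steps, d = 200_000, 10, 1.5

def kurtosis(x):
    """Pearson kurtosis; equals 3 for a normal distribution."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2

# IID steps: chi(d) lengths (r^2 ~ Gamma(d/2, scale=2)) with uniform angles
lengths = np.sqrt(rng.gamma(d / 2, 2.0, size=(walkers, steps)))
angles = rng.uniform(-np.pi, np.pi, size=(walkers, steps))
dx = lengths * np.cos(angles)

k_one = kurtosis(dx[:, 0])               # single step: about 3.5 in theory, not normal
k_ten = kurtosis(dx.sum(axis=1))         # ten steps: about 3 + 0.5/10 = 3.05 in theory
print(f"kurtosis after 1 step: {k_one:.3f}, after {steps} steps: {k_ten:.3f}")
```

The excess kurtosis of a sum of n IID components is the single-step excess divided by n, which is why ten steps already look nearly normal.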

Researchers associated with the New Mexico Center for the Spatiotemporal Modeling
of Cell Signaling have created models of the cell membrane that include models of actin
filament barriers and lipid rafts, where the jump data have properties similar to the jump data
analyzed here. It is hoped that these models will provide a connection between the statistics
of the membrane structure and the resulting PDF of the jump sizes.

Our results complement recent results on the analysis of the motion of proteins on longer
time scales using mean squared displacement ideas. For example: [I] uses k-space image
correlation spectroscopy analysis to study single molecule density data of lipids and
proteins labeled with quantum dots; [T8] uses a variational Bayesian treatment of hidden Markov
models to analyze many short tracks, identifying several diffusion states; [T3] uses nonergodic
motion, fractal structures, and multifractional random motion to study the motion of
proteins; [U] uses Bayesian statistics to better quantify noise from sampling limitations and
biological heterogeneity; [25] shows that both an ergodic and a nonergodic process exist in the
plasma membrane and that the ergodic process resembles a fractal structure originating in
the macromolecular crowding in the cell membrane; [9] uses hidden Markov models to
identify multiple states of diffusion within experimental trajectories.
A The fits of the CDFs and PDFs

In this section we show how well the computed PDF power laws fit the unscaled data for 1,
2 and 3 time steps and data sets A and B. Again we see that the fits are excellent, but now the
noise in the data is more apparent.




Figure A.1: The PDF fits for data sets A and B and 1, 2, and 3 jumps: (a) 1 jump, data set A;
(b) 1 jump, data set B; (c) 2 jumps, data set A; (d) 2 jumps, data set B; (e) 3 jumps, data set A;
(f) 3 jumps, data set B.